An investigation of cepstral parameterisations for large vocabulary speech recognition
Authors
Abstract
We examined variants of MFCC and PLP cepstral parameterisations in the context of large vocabulary continuous speech recognition under different acoustical environmental conditions. Compared to MFCC, mel-frequency PLP uses a cubic root intensity-to-loudness law, and an LPC analysis is applied to the mel-warped spectrum. In LPC-smoothed MFCC, the only difference from MFCC is the additional LPC smoothing of the warped spectrum. While neither technique was able to significantly outperform the MFCC parameterisation in our setup, which includes an LDA feature transformation, feature set combination via DMC at the acoustic likelihood level and via ROVER at the recognized word level delivered small but consistent improvements.
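The difference between the three parameterisations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' front-end: the function names, the model order of 12, the 20-channel filterbank, and the choice to derive cepstra from the all-pole model by the standard LPC-to-cepstrum recursion are all assumptions made for illustration, and other PLP steps such as equal-loudness weighting are omitted. The input is assumed to be a vector of mel-warped filterbank energies for one frame.

```python
import numpy as np

def dct2(x, n_out):
    # Plain (unnormalised) DCT-II, written out to avoid extra dependencies.
    m = len(x)
    n = np.arange(m)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * m)) @ x

def mfcc_like(mel_energies, n_ceps=12):
    # MFCC-style: log compression followed by a DCT; c0 is dropped.
    return dct2(np.log(mel_energies), n_ceps + 1)[1:]

def autocorr_from_spectrum(spec, order):
    # Mirror the half spectrum into an even full-circle spectrum; its
    # inverse DFT is then a real autocorrelation sequence.
    full = np.concatenate([spec, spec[-2:0:-1]])
    return np.fft.ifft(full).real[: order + 1]

def levinson(r, order):
    # Levinson-Durbin recursion for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_cepstra(mel_energies, order=12, cube_root=False):
    # cube_root=True  -> mel-frequency-PLP-style (cube-root loudness law)
    # cube_root=False -> LPC-smoothed-MFCC-style (assumed variant)
    spec = np.cbrt(mel_energies) if cube_root else np.asarray(mel_energies, float)
    r = autocorr_from_spectrum(spec, order)
    a, _ = levinson(r, order)
    c = np.zeros(order + 1)
    for n in range(1, order + 1):
        acc = sum(m * c[m] * a[n - m] for m in range(1, n))
        c[n] = -a[n] - acc / n  # cepstrum of the all-pole model 1/A(z)
    return c[1:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.uniform(0.5, 2.0, size=20)  # hypothetical mel energies
    print(mfcc_like(frame))
    print(lpc_cepstra(frame, cube_root=False))
    print(lpc_cepstra(frame, cube_root=True))
```

In this sketch the only thing separating the two LPC-based variants is the `cube_root` flag, which mirrors the abstract's statement that mel-frequency PLP adds a cube-root intensity-to-loudness law on top of the LPC analysis of the mel-warped spectrum.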
Similar resources
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Towards missing data recognition with cepstral features
We study in this work the Missing Data Recognition (MDR) framework applied to a large vocabulary continuous speech recognition (LVCSR) task with cepstral models when the speech signal is corrupted by musical noise. We do not propose a full system that solves this difficult problem, but we rather present some of the issues involved and study some possible solutions to them. We focus in this work...
The use of cepstral means in conversational speech recognition
Environmental robustness and speaker independence are important issues of current speech recognition research. Channel and speaker adaptation methods do the best job when the adaptation is done towards a normalized acoustic model. Normalization methods might make use of the model but primarily influence the signal such that important information is kept and unwanted distortions are cancelled out. Most...
Regularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition
In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high var...